In classic reinforcement learning algorithms, agents make decisions at discrete and fixed time intervals. The physical duration between one decision and the next becomes a critical hyperparameter. When this duration is too short, the agent needs to make many decisions to achieve its goal, aggravating the problem's difficulty. But when this duration is too long, the agent becomes incapable of controlling the system. Physical systems, however, do not need a constant control frequency. For learning agents, it is desirable to operate with low frequency when possible and high frequency when necessary. We propose a framework called Continuous-Time Continuous-Options (CTCO), where the agent chooses options as sub-policies of variable durations. Such options are time-continuous and can interact with the system at any desired frequency providing a smooth change of actions. The empirical analysis shows that our algorithm is competitive w.r.t. other time-abstraction techniques, such as classic option learning and action repetition, and practically overcomes the difficult choice of the decision frequency.
translated by 谷歌翻译
政策梯度定理(Sutton等,2000)规定了目标政策下的累积折扣国家分配以近似梯度。实际上,基于该定理的大多数算法都打破了这一假设,引入了分布转移,该分配转移可能导致逆转溶液的收敛性。在本文中,我们提出了一种新的方法,可以从开始状态重建政策梯度,而无需采取特定的采样策略。可以根据梯度评论家来简化此形式的策略梯度计算,由于梯度的新钟声方程式,可以递归估算。通过使用来自差异数据流的梯度评论家的时间差异更新,我们开发了第一个以无模型方式避开分布变化问题的估计器。我们证明,在某些可实现的条件下,无论采样策略如何,我们的估计器都是公正的。我们从经验上表明,我们的技术在存在非政策样品的情况下实现了卓越的偏见变化权衡和性能。
translated by 谷歌翻译
Softmax政策的政策梯度(PG)估计与子最佳饱和初始化无效,当密度集中在次良动作时发生。从策略初始化或策略已经收敛后发生的环境的突然变化可能会出现次优策略饱和度,并且SoftMax PG估计器需要大量更新以恢复有效的策略。这种严重问题导致高样本低效率和对新情况的适应性差。为缓解此问题,我们提出了一种新的政策梯度估计,用于软MAX策略,该估计在批评中利用批评中的偏差和奖励信号中存在的噪声来逃避策略参数空间的饱和区域。我们对匪徒和古典MDP基准测试任务进行了分析和实验,表明我们的估算变得更加坚固,以便对政策饱和度更加强大。
translated by 谷歌翻译
尽管政策梯度方法的普及日益越来越大,但它们尚未广泛用于样品稀缺应用,例如机器人。通过充分利用可用信息,可以提高样本效率。作为强化学习中的关键部件,奖励功能通常仔细设计以引导代理商。因此,奖励功能通常是已知的,允许访问不仅可以访问标量奖励信号,而且允许奖励梯度。为了从奖励梯度中受益,之前的作品需要了解环境动态,这很难获得。在这项工作中,我们开发\ Textit {奖励政策梯度}估计器,这是一种新的方法,可以在不学习模型的情况下整合奖励梯度。绕过模型动态允许我们的估算器实现更好的偏差差异,这导致更高的样本效率,如经验分析所示。我们的方法还提高了在不同的Mujoco控制任务上的近端策略优化的性能。
translated by 谷歌翻译
在许多领域,包括强化学习和控制在内的许多领域,从一系列高维观测中学习或识别动力学是一个困难的挑战。最近通过潜在动力学从生成的角度研究了这个问题:将高维观测结果嵌入到较低维的空间中,可以在其中学习动力学。尽管取得了一些成功,但尚未将潜在动力学模型应用于现实世界的机器人系统,在这些机器人系统中,学习的表示形式必须适合各种感知混杂和噪声源。在本文中,我们提出了一种共同学习潜在状态表示的方法以及在感知困难条件下的长期计划和闭环控制的相关动力。作为我们的主要贡献,我们描述了我们的表示如何能够通过检测新颖或分布(OOD)输入来捕获测试时间的异质或输入特异性不确定性的概念。我们介绍了有关两个基于图像的任务的预测和控制实验的结果:一个模拟的摆平衡任务和实现任务的现实世界机器人操纵器。我们证明,与仅在不同程度的输入降解的情况下,我们的模型可产生更准确的预测,并表现出改善的控制性能。
translated by 谷歌翻译
Rapid advancements in collection and dissemination of multi-platform molecular and genomics data has resulted in enormous opportunities to aggregate such data in order to understand, prevent, and treat human diseases. While significant improvements have been made in multi-omic data integration methods to discover biological markers and mechanisms underlying both prognosis and treatment, the precise cellular functions governing these complex mechanisms still need detailed and data-driven de-novo evaluations. We propose a framework called Functional Integrative Bayesian Analysis of High-dimensional Multiplatform Genomic Data (fiBAG), that allows simultaneous identification of upstream functional evidence of proteogenomic biomarkers and the incorporation of such knowledge in Bayesian variable selection models to improve signal detection. fiBAG employs a conflation of Gaussian process models to quantify (possibly non-linear) functional evidence via Bayes factors, which are then mapped to a novel calibrated spike-and-slab prior, thus guiding selection and providing functional relevance to the associations with patient outcomes. Using simulations, we illustrate how integrative methods with functional calibration have higher power to detect disease related markers than non-integrative approaches. We demonstrate the profitability of fiBAG via a pan-cancer analysis of 14 cancer types to identify and assess the cellular mechanisms of proteogenomic markers associated with cancer stemness and patient survival.
translated by 谷歌翻译
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
translated by 谷歌翻译
Data scarcity is a notable problem, especially in the medical domain, due to patient data laws. Therefore, efficient Pre-Training techniques could help in combating this problem. In this paper, we demonstrate that a model trained on the time direction of functional neuro-imaging data could help in any downstream task, for example, classifying diseases from healthy controls in fMRI data. We train a Deep Neural Network on Independent components derived from fMRI data using the Independent component analysis (ICA) technique. It learns time direction in the ICA-based data. This pre-trained model is further trained to classify brain disorders in different datasets. Through various experiments, we have shown that learning time direction helps a model learn some causal relation in fMRI data that helps in faster convergence, and consequently, the model generalizes well in downstream classification tasks even with fewer data records.
translated by 谷歌翻译
智能仪表测量值虽然对于准确的需求预测至关重要,但仍面临一些缺点,包括消费者的隐私,数据泄露问题,仅举几例。最近的文献探索了联合学习(FL)作为一种有前途的隐私机器学习替代方案,该替代方案可以协作学习模型,而无需将私人原始数据暴露于短期负载预测中。尽管有着美德,但标准FL仍然容易受到棘手的网络威胁,称为拜占庭式攻击,这是由错误和/或恶意客户进行的。因此,为了提高联邦联邦短期负载预测对拜占庭威胁的鲁棒性,我们开发了一个最先进的基于私人安全的FL框架,以确保单个智能电表的数据的隐私,同时保护FL的安全性模型和架构。我们提出的框架利用了通过符号随机梯度下降(SignsGD)算法的梯度量化的想法,在本地模型培训后,客户仅将梯度的“符号”传输到控制中心。当我们通过涉及一组拜占庭攻击模型的基准神经网络的实验突出显示时,我们提出的方法会非常有效地减轻此类威胁,从而优于常规的FED-SGD模型。
translated by 谷歌翻译
身份验证系统容易受到模型反演攻击的影响,在这种攻击中,对手能够近似目标机器学习模型的倒数。生物识别模型是这种攻击的主要候选者。这是因为反相生物特征模型允许攻击者产生逼真的生物识别输入,以使生物识别认证系统欺骗。进行成功模型反转攻击的主要限制之一是所需的训练数据量。在这项工作中,我们专注于虹膜和面部生物识别系统,并提出了一种新技术,可大大减少必要的训练数据量。通过利用多个模型的输出,我们能够使用1/10进行模型反演攻击,以艾哈迈德和富勒(IJCB 2020)的训练集大小(IJCB 2020)进行虹膜数据,而Mai等人的训练集大小为1/1000。 (模式分析和机器智能2019)的面部数据。我们将新的攻击技术表示为结构性随机,并损失对齐。我们的攻击是黑框,不需要了解目标神经网络的权重,只需要输出向量的维度和值。为了显示对齐损失的多功能性,我们将攻击框架应用于会员推理的任务(Shokri等,IEEE S&P 2017),对生物识别数据。对于IRIS,针对分类网络的会员推断攻击从52%提高到62%的准确性。
translated by 谷歌翻译